Exploratory Data Analysis of Starbucks’ and Dunkin Donuts’ Nutritional Information

Final Project
Data Science 1 with R (STAT 301-1)

Author

Rohan Krishnamurthi

Published

December 8, 2023

Introduction

In this report, I examine the nutritional information of Starbucks’ and Dunkin Donuts’ food and drink items. The first dataset I analyze contains the nutritional information of all beverages available at Starbucks. The second dataset covers all of the food items available at Starbucks. The third dataset covers the nutritional information of all products, both food and drink items, available at Dunkin Donuts.

Motivation for Research

I chose these datasets because I normally enjoy both Starbucks’ and Dunkin Donuts’ offerings. However, it must be noted that many students and I are currently boycotting Starbucks due to the company’s decision not to support the Palestinian people given the humanitarian crisis they are currently experiencing. While I enjoy their products, I currently will not support the company due to this issue.

Nevertheless, I have had plenty of items from both their drink and food menus and I feel knowledgeable enough to work with data concerning both. Also, I really enjoy making and trying intricate coffee and tea drinks, so I’d be happy to work with data regarding Starbucks and Dunkin Donuts. I decided to include the food information as well as a challenge to work with multiple sets of data. Working with data from both Starbucks and Dunkin Donuts allows more depth in the exploration of nutrition information.

Research Questions

The overarching research question for this EDA is what one can consume at either Starbucks or Dunkin Donuts in order to maintain a healthy, nutritious diet. There are many ways to address this question. Some initial queries are how calories, fat, protein, and sugar vary by drink type or by food item at either retailer. To involve both food and drink items, I propose investigating which combination of food and drink items is most nutritious at either Starbucks or Dunkin Donuts. A related question is which drink or food items are least nutritious and should generally be avoided. Lastly, I propose asking how the nutritional content of food and drinks differs between Starbucks and Dunkin Donuts, and which retailer has the most nutritious options overall.

Data Overview and Quality

The three raw datasets were copied into new datasets, which were then cleaned and prepared for analysis. The first raw dataset, stored as “starbucks.csv”, contains 18 variables and 242 observations, corresponding to 242 different drinks. Three of the variables contain categorical data, corresponding to drink identifiers, and the remaining 15 variables contain numerical data, corresponding to nutritional information. Some observations have missing nutritional values (specifically, for caffeine), which are recorded as NA instead of a numerical value.

The second raw dataset, stored as “starbucks_menu_nutrition_food_redo.csv”, contains 6 variables and 113 observations, corresponding to 113 different food items. One variable contains categorical data, corresponding to the food name. The remaining five variables contain numerical data, corresponding to the nutritional information, such as calories, fat, carbohydrates, fiber, and protein content. There are no missing values in this data set.

The third raw dataset, stored as “dunkindonutsnutrition.csv”, contains 13 variables and 790 observations, corresponding to 780 different products sold by Dunkin Donuts. All variables are stored as character vectors. Two variables concern the item category and name, while the remaining variables concern nutritional information. There are no missing values in this data set either.

Starbucks Data Cleaning and Preparation

In the 0a_starbucks_data_preparation R script, the dataset “starbucks_menu_nutrition_food_redo.csv” was read in, copied as starbucks_food_data, and then cleaned. The first issue with the raw data was that the variable names contained spaces, which could make it difficult to select these variables. The spaces were replaced with underscores using the names() function. The variable names were also converted to lowercase for consistency.
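The renaming step can be sketched in base R as follows. This is a minimal illustration rather than the code in the preparation script: the toy data frame, its column names, and its values are made up, and only the use of names() comes from the description above.

```r
# Toy stand-in for the raw food data. Column names and values are
# hypothetical and only serve to show the renaming step.
starbucks_food_data <- data.frame(
  "Food Item" = c("Item A", "Item B"),
  "Calories" = c(100, 200),
  "Fat g" = c(1.5, 8),
  check.names = FALSE  # keep the spaces so there is something to clean
)

# Replace each space with an underscore, then lowercase every name.
names(starbucks_food_data) <- tolower(gsub(" ", "_", names(starbucks_food_data)))

print(names(starbucks_food_data))
```

After this step the columns can be selected without backticks or quoting, for example starbucks_food_data$fat_g.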

The dataset “starbucks.csv” was read in, copied as starbucks_drinks_data, and then cleaned as well. This data frame had the same issue: the column names contained uppercase letters and spaces. This was addressed using the names() function as well. Additionally, in the column names “Total Fat g”, “Dietary Fiber g”, and “Total Carbohydrates g”, the initial word was removed so these variables would have the same names as in the starbucks_food_data dataset. This facilitated joining observations from both data frames.
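As a rough sketch of the prefix removal, assuming it is done with a regular expression (the script may do it differently), the three column names quoted above can be shortened like this:

```r
# Column names as quoted in the text, plus one unaffected name for contrast.
drink_names <- c("Beverage", "Total Fat g", "Dietary Fiber g", "Total Carbohydrates g")

# Drop a leading "Total " or "Dietary ", then apply the same
# underscore and lowercase cleanup used for the food data.
drink_names <- sub("^(Total|Dietary) ", "", drink_names)
drink_names <- tolower(gsub(" ", "_", drink_names))

print(drink_names)
```

This sketch yields carbohydrates_g, whereas the report later refers to carb_g, so the actual script presumably applies a further rename that is not shown here.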

In the beverage_prep column of starbucks_drinks_data, the beverage size was missing in some observations but could be deduced from the size listed for the previous (or second previous) beverage. The size was added to this column using a for loop to iterate through each observation and if statements to identify the size and then add it to the observation.
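One way to write such a loop is to remember the most recent size seen and prepend it to rows that lack one. The sketch below is an assumption about how this could work, not the script itself: the size labels (Short, Tall, Grande, Venti), the example strings, and the helper get_size() are all hypothetical.

```r
# Hypothetical beverage_prep values: only the first row of each group has a size.
beverage_prep <- c("Short Nonfat Milk", "2% Milk", "Soymilk",
                   "Tall Nonfat Milk", "2% Milk", "Soymilk")
sizes <- c("Short", "Tall", "Grande", "Venti")  # assumed size labels

# Hypothetical helper: return the size found in a string, or NA if none.
get_size <- function(x) {
  for (s in sizes) {
    if (grepl(s, x, fixed = TRUE)) return(s)
  }
  NA_character_
}

current_size <- NA_character_
for (i in seq_along(beverage_prep)) {
  found <- get_size(beverage_prep[i])
  if (!is.na(found)) {
    current_size <- found  # remember the most recent size
  } else if (!is.na(current_size)) {
    # No size in this row: reuse the size from the preceding rows.
    beverage_prep[i] <- paste(current_size, beverage_prep[i])
  }
}

print(beverage_prep)
```

Carrying the last seen size forward covers both the "previous" and "second previous" cases with one rule.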

Additionally, the information in the beverage_prep column in starbucks_drinks_data was used to create two new variables: one for milk type and one for size. First, the column milk_type was created by extracting the type of milk from the beverage_prep column. Drinks without any milk intentionally had NA in this column. A for loop was used to iterate through each observation in the beverage_prep column, and if statements were used to identify the type of milk and add it to the milk_type column for each observation. The column size was added in the same way: a for loop was used to iterate through each observation in the beverage_prep column, and if statements were used to identify the beverage size and add it to the size column for each observation.
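The extraction described above might look like the following sketch. The milk labels are the ones that appear in the milk type table later in this report; the size labels and the example strings are assumptions.

```r
# Hypothetical beverage_prep values; the last one has no milk.
beverage_prep <- c("Short Nonfat Milk", "Tall 2% Milk", "Grande Soymilk",
                   "Venti Whole Milk", "Tall")
milk_labels <- c("Nonfat Milk", "2% Milk", "Soymilk", "Whole Milk")
size_labels <- c("Short", "Tall", "Grande", "Venti")  # assumed size labels

# Start with NA so that drinks without milk (or a size) stay NA.
milk_type <- rep(NA_character_, length(beverage_prep))
size <- rep(NA_character_, length(beverage_prep))

for (i in seq_along(beverage_prep)) {
  for (m in milk_labels) {
    if (grepl(m, beverage_prep[i], fixed = TRUE)) milk_type[i] <- m
  }
  for (s in size_labels) {
    if (grepl(s, beverage_prep[i], fixed = TRUE)) size[i] <- s
  }
}

print(data.frame(beverage_prep, milk_type, size))
```

The same result could be obtained without loops, but the loop form mirrors the approach described in the text.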

Finally, a new data frame entitled starbucks_all_data was created by joining the two existing data frames with the merge() function. The beverage and food_item variables were renamed to item, so the data frames could share this variable. Then, the two data frames were combined on the variables item, calories, fat_g, carb_g, fiber_g, and protein_g; only these variables were selected from each data frame when it was passed to merge().
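A minimal sketch of this join is shown below, with made-up rows. The argument all = TRUE is an assumption: because the join is on every shared column, a default inner join would keep only rows that are identical in both data frames, so a full join is needed to stack drinks and food together.

```r
# Toy data frames with hypothetical items and values.
starbucks_drinks_data <- data.frame(
  beverage = c("Drink A", "Drink B"),
  calories = c(4, 140), fat_g = c(0, 3), carb_g = c(0, 15),
  fiber_g = c(0, 1), protein_g = c(1, 9),
  caffeine_mg = c(175, 75)  # drink-only column, dropped before merging
)
starbucks_food_data <- data.frame(
  food_item = c("Food A", "Food B"),
  calories = c(90, 280), fat_g = c(0, 1.5), carb_g = c(24, 56),
  fiber_g = c(4, 2), protein_g = c(1, 9)
)

# Give both data frames a common identifier column.
names(starbucks_drinks_data)[names(starbucks_drinks_data) == "beverage"] <- "item"
names(starbucks_food_data)[names(starbucks_food_data) == "food_item"] <- "item"

shared <- c("item", "calories", "fat_g", "carb_g", "fiber_g", "protein_g")
starbucks_all_data <- merge(
  starbucks_drinks_data[shared],
  starbucks_food_data[shared],
  by = shared,
  all = TRUE  # assumption: keep every row from both data frames
)

print(starbucks_all_data)
```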

Dunkin Donuts Data Cleaning and Preparation

In the 0b_dunkin_data_preparation R script, the dataset “dunkindonutsnutrition.csv” was read in, copied as dunkin_donuts_data, and then cleaned. First, the variable names were changed with the names() function to remove parentheses and spaces. Additionally, the names were made lowercase for consistency.

A new variable, item_type, was created as a factor. The level “drink” was assigned to drinks (based on the category variable) and the level “food” was assigned to food items. Variables corresponding to nutritional information were converted into numeric vectors, as they generally contained numeric values. Moreover, a new factor variable, size, was created to identify the size of each product. The size was extracted from the item name using the grepl() function. Similarly, a new factor variable, milk_type, was created to identify the type of milk in each product, if applicable. Items without a size or milk type were assigned the level “not applicable” for that variable.
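The steps in this paragraph can be sketched as follows. The item names, calorie values, size labels (Small, Medium, Large), and the drink_categories vector are hypothetical; only grepl(), the factor variables, and the “not applicable” level come from the description above.

```r
# Toy version of the raw Dunkin data: every column is stored as character.
dunkin_donuts_data <- data.frame(
  category = c("Hot Coffee", "Donuts", "Iced Latte"),
  item = c("Hot Coffee with Skim Milk, Medium", "Example Donut",
           "Iced Latte with Whole Milk, Large"),
  calories = c("25", "240", "170"),
  stringsAsFactors = FALSE
)

# Assumed list of categories that count as drinks.
drink_categories <- c("Hot Coffee", "Iced Latte")
dunkin_donuts_data$item_type <- factor(
  ifelse(dunkin_donuts_data$category %in% drink_categories, "drink", "food"),
  levels = c("drink", "food")
)

# Convert nutritional columns from character to numeric.
dunkin_donuts_data$calories <- as.numeric(dunkin_donuts_data$calories)

# Extract the size from the item name; default to "not applicable".
size <- rep("not applicable", nrow(dunkin_donuts_data))
for (s in c("Small", "Medium", "Large")) {
  size[grepl(s, dunkin_donuts_data$item, fixed = TRUE)] <- s
}
dunkin_donuts_data$size <- factor(
  size, levels = c("Small", "Medium", "Large", "not applicable")
)

print(dunkin_donuts_data)
```

The milk_type variable would follow the same pattern with a different set of labels.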

Finally, the data was filtered into two new datasets: dunkin_donuts_drinks_data, containing the drink observations; and dunkin_donuts_food_data, containing the food observations.

From the R scripts, the cleaned datasets were written out as CSV files using the write_csv() function and then loaded for analysis using the read_csv() function.
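A dependency-free sketch of the split and the CSV round trip is given below. It uses base R's write.csv() and read.csv() in place of readr's write_csv() and read_csv(), which the report actually uses; the toy rows and the temporary output path are assumptions.

```r
# Toy cleaned data with hypothetical rows.
dunkin_donuts_data <- data.frame(
  item = c("Drink A", "Food A", "Drink B"),
  item_type = c("drink", "food", "drink"),
  calories = c(8, 240, 162)
)

# Split the data into drink and food observations.
dunkin_donuts_drinks_data <- subset(dunkin_donuts_data, item_type == "drink")
dunkin_donuts_food_data <- subset(dunkin_donuts_data, item_type == "food")

# Write one of the cleaned datasets to disk and read it back.
out_path <- file.path(tempdir(), "dunkin_donuts_drinks_data.csv")
write.csv(dunkin_donuts_drinks_data, out_path, row.names = FALSE)
drinks_reloaded <- read.csv(out_path)

print(drinks_reloaded)
```

Note that factor columns come back from a CSV file as plain character vectors, so they may need to be converted again after loading.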

Explorations

Analysis of Drinks

Nutritional Content by Drink Category

Sugar, Fat, and Calories Content

The first question I sought to answer was which drinks are the healthiest. I initially approached this question by comparing the fat, sugar, and calorie content of drinks. For these nutrients, a beverage with a lower amount of each is considered healthier. For this analysis, I compared drink categories rather than individual drinks, as this made it easier to generalize about drinks.
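Category averages of this kind can be computed with aggregate() in base R. The sketch below uses made-up rows, and the column names beverage_category and sugars_g are assumptions about the cleaned data; the report's own code may use a different approach.

```r
# Toy drinks data with hypothetical values.
starbucks_drinks_data <- data.frame(
  beverage_category = c("Coffee", "Coffee", "Smoothies", "Smoothies"),
  fat_g = c(0, 0, 1.5, 2.5),
  sugars_g = c(0, 0, 36, 38),
  calories = c(3, 5, 270, 290)
)

# Mean of each nutrient within each drink category.
category_means <- aggregate(
  cbind(fat_g, sugars_g, calories) ~ beverage_category,
  data = starbucks_drinks_data,
  FUN = mean
)

# Round to whole numbers for reporting.
category_means[-1] <- lapply(category_means[-1], round)

print(category_means)
```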

The following table and three figures depict the fat, sugar, and calorie content of types of drinks sold by Starbucks.

Starbucks Drinks Nutritional Info
Drink Category | Average fat content (g) | Average sugar content (g) | Average calories
Classic Espresso Drinks | 3 | 17 | 140
Coffee | 0 | 0 | 4
Frappuccino Blended Coffee | 3 | 57 | 277
Frappuccino Blended Creme | 2 | 48 | 233
Frappuccino Light Blended Coffee | 1 | 32 | 162
Shaken Iced Beverages | 0 | 26 | 114
Signature Espresso Drinks | 5 | 39 | 250
Smoothies | 2 | 37 | 282
Tazo Tea Drinks | 3 | 30 | 177

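Among Starbucks drink categories, Coffee is the lightest on every measure, averaging 0 g of fat, 0 g of sugar, and 4 calories. Frappuccino Blended Coffee has the highest average sugar (57 g), Smoothies have the highest average calories (282), and Signature Espresso Drinks have the highest average fat (5 g).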

The following table and three figures depict the fat, sugar, and calorie content of types of drinks sold by Dunkin Donuts.

Dunkin Donuts Drinks Nutritional Info
Drink Category | Average fat content (g) | Average sugar content (g) | Average calories
Cold Brew Coffee | 5 | 8 | 80
Coolatta | 2 | 96 | 427
Dunkin Refreshers | 5 | 60 | 320
Frozen Coffee | 13 | 118 | 640
Hot Americano | 0 | 0 | 8
Hot Cappuccino | 3 | 39 | 223
Hot Chocolate | 13 | 48 | 368
Hot Coffee | 4 | 27 | 162
Hot Latte | 6 | 43 | 274
Hot Macchiato | 4 | 39 | 233
Iced Americano | 0 | 0 | 8
Iced Cappuccino | 3 | 39 | 223
Iced Coffee | 3 | 24 | 142
Iced Latte | 6 | 43 | 273
Iced Macchiato | 3 | 39 | 226
Iced Tea | 0 | 25 | 109

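At Dunkin Donuts, Hot Americano and Iced Americano are the lightest categories, each averaging 0 g of fat, 0 g of sugar, and 8 calories. Frozen Coffee is the heaviest, with the highest average sugar (118 g) and calories (640) and, together with Hot Chocolate, the highest average fat (13 g). Coolatta follows in both sugar (96 g) and calories (427).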

Protein, Calcium, and Vitamin A and C Content

I then addressed this question by investigating which beverages at each retailer had the highest amounts of protein, calcium, and vitamins. For these nutrients, a beverage with a higher amount of each is considered healthier.

The Starbucks drinks data contains information regarding each drink’s protein, calcium, and vitamin A and C content. The content of each of these nutrients is compared in the following table and graphs. A boxplot was used to compare protein content, as with the nutrients in the previous section. However, frequency polygons were used to compare calcium, vitamin A, and vitamin C content due to a lack of data points other than 0%.

Starbucks Drinks Nutritional Info, continued
Drink Category | Average protein content (g) | Average calcium (%DV) | Average vitamin A (%DV) | Average vitamin C (%DV)
Classic Espresso Drinks | 9 | 0 | 0 | 0
Coffee | 1 | 0 | 0 | 0
Frappuccino Blended Coffee | 4 | 0 | 0 | 0
Frappuccino Blended Creme | 4 | 0 | 0 | 0
Frappuccino Light Blended Coffee | 4 | 0 | 0 | 0
Shaken Iced Beverages | 1 | 0 | 0 | 0
Signature Espresso Drinks | 10 | 0 | 0 | 0
Smoothies | 17 | 0 | 0 | 1
Tazo Tea Drinks | 7 | 0 | 0 | 0

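Smoothies have the highest average protein (17 g), followed by Signature Espresso Drinks (10 g) and Classic Espresso Drinks (9 g); Coffee and Shaken Iced Beverages have the lowest (1 g each). Average calcium and vitamin A are reported as 0% of the daily value in every category, and only Smoothies show a nonzero average for vitamin C (1%).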

The Dunkin Donuts drinks data only contains information regarding each drink’s protein content. The protein content of each type of drink sold at Dunkin is compared in the following table and boxplot.

Dunkin Donuts Drinks Nutritional Info, continued
Drink Category | Average protein content (g)
Cold Brew Coffee | 1
Coolatta | 2
Dunkin Refreshers | 3
Frozen Coffee | 7
Hot Americano | 0
Hot Cappuccino | 7
Hot Chocolate | 3
Hot Coffee | 3
Hot Latte | 10
Hot Macchiato | 8
Iced Americano | 0
Iced Cappuccino | 7
Iced Coffee | 3
Iced Latte | 10
Iced Macchiato | 7
Iced Tea | 0

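Hot Latte and Iced Latte have the highest average protein (10 g each), followed by Hot Macchiato (8 g). Hot Americano, Iced Americano, and Iced Tea average 0 g of protein.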

Nutritional Content by Milk Type

Another question I sought to answer is how the nutrition content of each drink varies by milk type. When people order drinks from coffee shops, there are many different ways they can customize their drink to their liking. Thus, I found it important to analyze nutrition by milk preference. For this analysis, I measured the fat, sugar, calories, and then protein of drinks having grouped them by milk type. I created boxplots to depict how these nutrients vary by each type of milk.

The following table and boxplots depict the average fat, sugar, calories, and protein for drinks of each type of milk at Starbucks. Observations with no milk were excluded from this analysis.

Starbucks Drinks Average Nutritional Info by Type of Milk
Milk Type | Fat (g) | Sugar content (g) | Calories | Protein content (g)
2% Milk | 6 | 31 | 218 | 10
Nonfat Milk | 1 | 36 | 190 | 8
Soymilk | 4 | 32 | 207 | 7
Whole Milk | 5 | 56 | 284 | 4
NA | 0 | 17 | 75 | 0

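Among Starbucks drinks made with milk, nonfat milk drinks have the lowest average fat (1 g) and calories (190), while 2% milk drinks have the highest average fat (6 g) and protein (10 g). Whole milk drinks average the most sugar (56 g) and calories (284) and the least protein (4 g).

The following table and boxplots report the average fat, sugar, calories, and protein for drinks of each type of milk at Dunkin Donuts. Observations with no milk were excluded from this analysis.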

Dunkin Donuts Drinks Average Nutritional Info by Type of Milk
Milk Type | Fat (g) | Sugar content (g) | Calories | Protein content (g)
cream | 13 | 48 | 332 | 4
skim | 1 | 44 | 229 | 8
whole | 7 | 44 | 283 | 8

Due to large overlap of boxes in the boxplots, as well as an abundance of outliers, I added density plots to better distinguish the nutritional information of each type of milk.

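Dunkin Donuts drinks made with cream have the highest average fat (13 g), sugar (48 g), and calories (332) and the lowest average protein (4 g). Skim and whole milk drinks have the same average sugar (44 g) and protein (8 g), but skim milk drinks average less fat (1 g versus 7 g) and fewer calories (229 versus 283).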

New metrics to assess healthiness of drinks

In the previous section, I analyzed which types of beverage had high (or low) contents of specific nutrients. There are more nuanced ways to assess the “healthiness” of a drink. In this section, I aim to incorporate multiple variables to develop new ways to analyze healthiness.

Beverages rich in protein

First, I sought to determine which types of beverages were richest in protein while lowest in less healthy quantities, such as calories and fat. I compared the ratio of protein to calories for each drink type. Protein content could also be compared to sugar, fat, or other quantities to produce different metrics.
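The ratio is simply grams of protein divided by calories. The sketch below applies it to three of the Starbucks category averages reported in this section; the data frame and column names are illustrative, not taken from the analysis script.

```r
# Category averages for three Starbucks drink types, as reported in this report.
protein_by_category <- data.frame(
  drink_category = c("Classic Espresso Drinks", "Coffee", "Smoothies"),
  protein_g = c(9, 1, 17),
  calories = c(140, 4, 282)
)

# Grams of protein per calorie, rounded to four decimal places.
protein_by_category$protein_per_calorie <- round(
  protein_by_category$protein_g / protein_by_category$calories, 4
)

print(protein_by_category)
```

Swapping calories for sugar or fat in the denominator gives the alternative metrics mentioned above.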

In the following table and scatterplot, I compare the ratio of grams of protein to calories for types of drinks at Starbucks. Due to difficulty in distinguishing data points in the scatterplot, I added faceted scatterplots as well.

Starbucks Protein to Calories
Drink Category | Protein content (g) | Calories | Ratio of protein (g) to calories
Classic Espresso Drinks | 9 | 140 | 0.0643
Coffee | 1 | 4 | 0.2500
Frappuccino Blended Coffee | 4 | 277 | 0.0144
Frappuccino Blended Creme | 4 | 233 | 0.0172
Frappuccino Light Blended Coffee | 4 | 162 | 0.0247
Shaken Iced Beverages | 1 | 114 | 0.0088
Signature Espresso Drinks | 10 | 250 | 0.0400
Smoothies | 17 | 282 | 0.0603
Tazo Tea Drinks | 7 | 177 | 0.0395

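Coffee has the highest ratio (0.2500), though this reflects its very low calorie count (4) rather than a meaningful amount of protein (1 g). Among the remaining categories, Classic Espresso Drinks (0.0643) and Smoothies (0.0603) provide the most protein per calorie, while Shaken Iced Beverages (0.0088) provide the least.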

In the following table and scatterplot, I compare the same quantities for drinks at Dunkin Donuts. A single scatterplot and faceted scatterplots were included as well.

Protein to Calories, Dunkin
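Drink Category | Protein content (g) | Calories | Ratio of protein (g) to calories
Cold Brew Coffee | 1 | 80 | 0.0125
Coolatta | 2 | 427 | 0.0047
Dunkin Refreshers | 3 | 320 | 0.0094
Frozen Coffee | 7 | 640 | 0.0109
Hot Americano | 0 | 8 | 0.0000
Hot Cappuccino | 7 | 223 | 0.0314
Hot Chocolate | 3 | 368 | 0.0082
Hot Coffee | 3 | 162 | 0.0185
Hot Latte | 10 | 274 | 0.0365
Hot Macchiato | 8 | 233 | 0.0343
Iced Americano | 0 | 8 | 0.0000
Iced Cappuccino | 7 | 223 | 0.0314
Iced Coffee | 3 | 142 | 0.0211
Iced Latte | 10 | 273 | 0.0366
Iced Macchiato | 7 | 226 | 0.0310
Iced Tea | 0 | 109 | 0.0000

Using the Dunkin Donuts category averages for protein and calories reported earlier, Iced Latte (0.0366) and Hot Latte (0.0365) provide the most protein per calorie, followed by Hot Macchiato (0.0343). Hot Americano, Iced Americano, and Iced Tea have a ratio of 0 because they average 0 g of protein, and Coolatta has the lowest nonzero ratio (0.0047).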

Beverages rich in vitamins

Another way I sought to assess how healthy beverages were was by assessing their overall vitamin and mineral contents. Unfortunately, this information was not available in the Dunkin Donuts drinks data. The Starbucks drinks data contains information regarding vitamin A, vitamin C, calcium, and iron. For this analysis, I took the sum of the listed percent daily values of these four nutrients and divided by four. This calculation gives a rough estimate of each drink type’s average percent daily value across these vitamins and minerals (i.e., how much of your daily vitamin and mineral needs you are getting through each drink). The following table contains the average vitamin and mineral percent daily value for each drink type at Starbucks.

Average Percent Daily Value of Vitamins and Minerals, Starbucks
Drink Category | Vitamin A (%DV) | Vitamin C (%DV) | Calcium (%DV) | Iron (%DV) | Average %DV of Vitamins and Minerals
Classic Espresso Drinks | 0 | 0 | 0 | 0 | 0.00
Coffee | 0 | 0 | 0 | 0 | 0.00
Frappuccino Blended Coffee | 0 | 0 | 0 | 0 | 0.00
Frappuccino Blended Creme | 0 | 0 | 0 | 0 | 0.00
Frappuccino Light Blended Coffee | 0 | 0 | 0 | 0 | 0.00
Shaken Iced Beverages | 0 | 0 | 0 | 0 | 0.00
Signature Espresso Drinks | 0 | 0 | 0 | 0 | 0.00
Smoothies | 0 | 1 | 0 | 0 | 0.25
Tazo Tea Drinks | 0 | 0 | 0 | 0 | 0.00
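The last column of the table is the sum of the four percent daily values divided by four. A minimal sketch of that calculation, using the Coffee and Smoothies rows and hypothetical column names:

```r
# Two rows from the table above; column names are illustrative.
vitamins <- data.frame(
  drink_category = c("Coffee", "Smoothies"),
  vitamin_a_dv = c(0, 0),
  vitamin_c_dv = c(0, 1),
  calcium_dv = c(0, 0),
  iron_dv = c(0, 0)
)

dv_cols <- c("vitamin_a_dv", "vitamin_c_dv", "calcium_dv", "iron_dv")

# Sum the four percent daily values in each row and divide by four.
vitamins$avg_dv <- rowSums(vitamins[dv_cols]) / length(dv_cols)

print(vitamins)
```

For Smoothies this gives (0 + 1 + 0 + 0) / 4 = 0.25, matching the table.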

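Only Smoothies have a nonzero average (0.25%), driven entirely by vitamin C; every other category is reported as 0% for all four nutrients.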

Analysis of food data

As with the drinks, I then sought to investigate which food items are the healthiest. The data regarding the beverages offered at Starbucks and Dunkin Donuts was largely similar and contained the same nutritional information. However, the two food datasets, starbucks_food_data and dunkin_donuts_food_data, have far fewer similarities. The Starbucks food data contains less nutrient information and does not categorize the food items. The Dunkin food data is more detailed and labels the food items at Dunkin Donuts by type. Thus, for this section of the EDA, the analysis was carried out separately for starbucks_food_data and dunkin_donuts_food_data.

Nutritional information of Starbucks food items

First, I identified the healthiest Starbucks food items in five separate categories: highest in protein, highest in fiber, lowest in fat, lowest in carbohydrates, and lowest in calories. The top ten food items in each category are listed in the tables below.

Highest Protein Foods, Starbucks
item | protein_g
Turkey Pesto Panini | 34
Roasted Turkey & Dill Havarti Sandwich | 32
Turkey & Havarti Sandwich | 29
Za'atar Chicken & Lemon Tahini Salad | 27
Chicken & Quinoa Protein Bowl with Black Beans and Greens | 27
Spicy Chorizo Monterey Jack & Egg Breakfast Sandwich | 26
Ancho Chipotle Chicken Panini | 26
Turkey & Fire-Roasted Corn Salad | 24
Smoked Turkey Protein Box | 24
Slow-Roasted Ham Swiss & Egg Breakfast Sandwich | 24

Highest Fiber Foods, Starbucks
item | fiber_g
Lentils & Vegetable Protein Bowl with Brown Rice | 21
Za'atar Chicken & Lemon Tahini Salad | 11
Strawberries & Jam Sandwich | 10
Green Goddess Avocado Salad | 10
Chicken & Quinoa Protein Bowl with Black Beans and Greens | 9
Multigrain Bagel | 8
8-Grain Roll | 7
Sprouted Grain Vegan Bagel | 7
Roasted Carrot & Kale Side Salad | 7
Turkey & Fire-Roasted Corn Salad | 7

Least Fat Foods, Starbucks
item | fat_g
Seasonal Fruit Blend | 0.0
Cinnamon Raisin Bagel | 1.0
Plain Bagel | 1.5
Classic Whole-Grain Oatmeal | 2.5
Hearty Blueberry Oatmeal | 2.5
Berry Trio Yogurt | 2.5
Fresh Blueberries and Honey Greek Yogurt Parfait | 2.5
Frappuccino Cookie Straw | 3.0
Everybody's Favorite - Bantam Bagel (2 Pack) | 3.5
Everything Bagel with Cheese | 3.5

Lowest Carbohydrate Foods, Starbucks
item | carb_g
Organic Avocado (Spread) | 5
Justin's Classic Almond Butter | 6
Cauliflower Tabbouleh Side Salad | 7
Garden Greens & Shaved Parmesan Side Salad | 9
Sous Vide Egg Bites: Bacon & Gruyere | 9
Justin's Chocolate Hazelnut Butter | 12
Sous Vide Egg Bites: Egg White & Red Pepper | 13
Everybody's Favorite - Bantam Bagel (2 Pack) | 14
Frappuccino Cookie Straw | 14
Petite Vanilla Bean Scone | 18

Lowest Calorie Foods, Starbucks
item | calories
Frappuccino Cookie Straw | 90
Organic Avocado (Spread) | 90
Seasonal Fruit Blend | 90
Everybody's Favorite - Bantam Bagel (2 Pack) | 100
Petite Vanilla Bean Scone | 120
Cauliflower Tabbouleh Side Salad | 130
Chocolate Cake Pop | 160
Classic Whole-Grain Oatmeal | 160
Chewy Chocolate Cookie | 170
Garden Greens & Shaved Parmesan Side Salad | 170
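Rankings like the ones above can be produced by sorting on one nutrient and keeping the first rows. The helper top_items() below is a hypothetical name introduced for illustration, and the two small data frames reuse a few values from the tables above; this is a sketch, not the report's actual code.

```r
# Hypothetical helper: return the n items with the highest (or lowest)
# value of one nutrient column.
top_items <- function(data, column, n = 10, decreasing = TRUE) {
  ranked <- data[order(data[[column]], decreasing = decreasing), c("item", column)]
  head(ranked, n)
}

# A few protein values from the table above, in shuffled order.
protein_data <- data.frame(
  item = c("Smoked Turkey Protein Box", "Turkey Pesto Panini",
           "Turkey & Havarti Sandwich", "Roasted Turkey & Dill Havarti Sandwich"),
  protein_g = c(24, 34, 29, 32)
)

# A few fat values from the table above, in shuffled order.
fat_data <- data.frame(
  item = c("Plain Bagel", "Frappuccino Cookie Straw",
           "Seasonal Fruit Blend", "Cinnamon Raisin Bagel"),
  fat_g = c(1.5, 3.0, 0.0, 1.0)
)

highest_protein <- top_items(protein_data, "protein_g", n = 2)
lowest_fat <- top_items(fat_data, "fat_g", n = 2, decreasing = FALSE)

print(highest_protein)
print(lowest_fat)
```

Setting decreasing = FALSE gives the "lowest" rankings (fat, carbohydrates, calories), while the default gives the "highest" rankings (protein, fiber).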

Summary of major findings

In this section of the EDA, I compared nutritional information of drinks and food items at Starbucks and Dunkin Donuts.

Conclusions

References

Arvidsson, J. (2023, September). Dunkin’ Donuts’ Nutrition: Dunkin’ Donuts’ Menu Nutrition, Micronutrients, and Calorie Information. Kaggle. https://www.kaggle.com/datasets/joebeachcapital/dunkin-donuts-nutrition

Sanchez-Arias, R. (2023, October 19). Sample Datasets: A collection of datasets from multiple sources to be used for demonstrations in data science courses. GitHub. https://github.com/reisanar/datasets

Starbucks. (2017). Nutrition factors for Starbucks: Nutrition information for Starbucks menu items, including food and drinks. Kaggle. https://www.kaggle.com/datasets/starbucks/starbucks-menu/data